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Abstract 

Background: Cellular life with complex metabolism probably evolved during the reign of RNA, when it served as 
both information carrier and enzyme. Jensen proposed that enzymes of primordial cells possessed broad 
specificities: they were generalist. When and under what conditions could primordial metabolism run by generalist 
enzymes evolve to contemporary-type metabolism run by specific enzymes? 

Results: Here we show by numerical simulation of an enzyme-catalyzed reaction chain that specialist enzymes 
spread after the invention of the chromosome because protocells harbouring unlinked genes maintain largely 
non-specific enzymes to reduce their assortment load. When genes are linked on chromosomes, high enzyme 
specificity evolves because it increases biomass production, also by reducing taxation by side reactions. 

Conclusion: The constitution of the genetic system has a profound influence on the limits of metabolic efficiency. 
The major evolutionary transition to chromosomes is thus proven to be a prerequisite for a complex metabolism. 
Furthermore, the appearance of specific enzymes opens the door for the evolution of their regulation. 

Reviewers: This article was reviewed by Sandor Pongor, Caspar Jekely, and Rob Knight. 
Keywords: Origin of life. Chromosome, Metabolism, Ribozyme, Major transitions. Enzyme evolution 



Background 

The major evolutionary transitions [1] set a timeline 
onto which other evolutionary milestones can be inte- 
grated. The emergence of complex metabolism in the 
RNA world [2-4] (an age when RNA served as both 
information carrier and enzyme) is one such mile- 
stone, whose place in the order of events has not yet 
been determined. Some rudimentary metabolism could 
have existed on mineral surfaces [5], where RNA oligo- 
mers can also form [6]. Template-based replication of 
these oligomers was achieved at this stage, which trans- 
formed RNA molecules into units of evolution. These 
independent replicators became compartmentalized dur- 
ing the first major evolutionary transition [1], and by their 
very nature, possessed at least the ability to enhance their 
own formation. A good fraction of early ribozymes 
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(RNA enzymes) was likely to have been inefficient 
generalists [7], as it must have taken time to optimize their 
function. Furthermore, the ever-changing and unpredict- 
able primordial environment probably favored broad speci- 
ficities and the ability to adapt to new substrates [8]. By the 
invention of translated protein synthesis [1], a complex me- 
tabolism was likely in place. We can conclude that a metab- 
olism driven by specialist enzymes is likely to have emerged 
in the RNA world [2], before the invention of the genetic 
code and translation. 

Evolution of complex metabolism requires that enzymes 
be able to evolve from one function to another; and be 
able to reach high rate enhancement and specificity. The 
plethora of artificially evolved ribozymes [3,4,9] testify that 
RNA is well capable of acquiring novel catalytic functions. 
Furthermore, evolution can lead from one enzyme func- 
tion to another (e.g. the Bartel I ligase that was turned into 
an RNA polymerase [10,11]). RNA enzymes are capable of 
very specific catalysis with potentially high catalytic rate 
enchantment [12]. Thus there is no biochemical reason 
for not having specific enzymes rather soon after the 
appearance of ribozymes. The possibility of division of 
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labor and evolution of specialist enzymes has also 
been demonstrated in theoretical studies of surface 
metabolism [13] and compartmentalized systems [14], 
however only for a few enzyme specificities and 
without modeling of enzyme-substrate interactions. 
Another theoretical investigation, however, found limited 
evolution towards specialist enzymes [15]. 

The question we address in this paper is whether spe- 
cialist enzymes evolved before or after the establishment 
of chromosomes. 

Methods 

We follow the evolution of enzyme specificities acting 
on a linear series of reactions (Figure la) with a model 
similar to that used in the pioneering study of Kacser 
and Beeby [15]. Substrates and enzymes are represented 
by hyper-blocks and cavities of D dimensions, and both 
have functional groups at each face (an important deviation 
from the cited model) (Figure lb). The fit between the sub- 
strate and the enzyme determines catalytic activity, while 
the match between the functional groups determines 
whether the reaction produces a component in the biomass 
producing line of reaction (Figure lb, top), a waste product 
(Figure lb, bottom right) or nothing (Figure lb, 
bottom left). More specifically, the enzymatic activity 
is calculated from the binding energy of the enzyme- 
substrate complex using the Lennard-Jones equation: 

e = 2Y^^_^(l/Rf where Rj is the distance 

between the wall of the cavity and the face of the 
centered substrate (see Figure lb). We assume that 
the catalytic activity is proportional to this energy: e 
= - s (thermodynamics would require the natural 
logarithm of this, but since the function remains monot- 
onous we neglected it for speeding up the rate of evolution, 
thus reducing simulation time). Each reaction step / (both 
forward and waste) is treated as a simplified Michaelis- 
Menten step with unsaturated enzymes, thus the flux to- 
ward biomass accumulation is Ii = ei{[Xi_i\ - [XJ) and the 
flux toward waste is Ii = 0.5e/[X/_ i]. We further assume that 
[Xq] = 1, [Xfinai] = 0 and [ W^] ~ 0 (assuming that waste pro- 
ducts diffuse quickly, and it cannot be converted back to 
non- waste product). Coupled with the equations ensuring 
conservation of flux (// = + 1 + 7^ + 1), any of the fluxes can 
be computed. The flux of the last catalytic reaction (/final) 
determines the rate of biomass accumulation, which in turn 
translates to rates of protocell division (upon reaching a 
certain number of ribozymes or total flux). As this flux 
determines the replication rate of the protocell, it serves as 
a measure of fitness. Detailed derivation of the model is 
found in the Appendix. 

The population dynamics of the protocefl foflows a 
Moran process, i.e. when a protocefl divides one of the 
daughter cefls replaces the original protocefl, and the 



other replaces a randomly chosen protocefl from the 
population. By using this update rule we assume that the 
population size is constant (A/^=5000). Protocells divide 
when the cumulative flux of the system reaches a 
threshold {Cf^^ = 100). Upon replication of the genes 
mutations can occur in the genes of the protocell. 
Each sides of the enzyme is altered by a random num- 
ber obtained from a normal distribution with mean 0 
and o = 0.05 standard deviation. 

We implemented three separate versions of the model: 
in one (version 1), ribozymes replicate individually, in 
the second (version 2) chromosomes sometimes form, 
but genes mostly replicate individually; and in the third 
(version 3), genes are permanently linked together in a 
chromosome. In the presence of a chromosome the cu- 
mulative flux of the system needs to reach a threshold 
in order for the protocefl to divide (see above). In the in- 
dividually replicating ribozymes case (version 1), when- 
ever the cumulative flux exceeds a value (Cf = 6.7) a 
new replicator is added to the protocefl tifl the number 
of independent replicators reaches the threshold {rf^^^^ =15) 
value. The protocells divide at the same cumulative flux as 
in the chromosome case, as Cj^^^ = rf"^^^ • C?^*^. In version 2, 
the total number of genes need to reach the threshold is 
rf^^^ =15, irrespective of them being individually present or 
linked into a chromosome. Here, independent ribo- 
zymes replicate if the cumulative flux exceeds the 
value C?*^*^ = 6.7, and chromosomes replicate if the cu- 
mulative flux exceeds the same value times the num- 
ber of genes in the chromosome. The new replicator 
is produced by copying and mutating a randomly 
chosen ribozyme present within the protocell. In the 
2^*^ version of the model there is a 10'^ chance at each time 
step that the genes form a chromosome, and will replicate 
together from that point of time. At cell division, the gen- 
etic materials are divided among the daughter protocells. 
Either both of them gain one copy of the chromosome (ver- 
sion 3), or each ribozyme (version 1 and 2) or chromosome 
(version 2) is randomly assigned to one of the daughter 
protocells. 

Initially aU protoceUs are identical, and aU ribo- 
zymes are totally generalist (as a worst-case assumption), 
i.e. they are large enough to fit onto every substrate. In the 
1^* version of the model, the protocells initially harbor as 
many ribozymes as there are reactions. In the 2^*^ version 
of the model, all genes start as individually replicating and 
there are no chromosomes in any of them. We foUowed 
the evolutionary dynamics till equilibrium. 

Results and discussion 

The three versions of the model represent three stages 
of chromosome evolution. The initial phase of no chro- 
mosomes (version 1), the transitional stage when genes 
can link up to chromosomes, but assortment to 



Szilagyi et at. Biology Direct 2012, 7:38 
http://www.biology-direct.eom/content/7/1/38 



Page 3 of 10 



W 





X- 




w 




w 



X, 




small size, 
group match 




optimal size, 
group match 






exclusion 



optimal size, 
group mismatch 



2< 






Figure 1 a. Reaction scheme. X's represent the products, W's the waste products. The final flux (in this five step reaction l^) is the fitness of a 
given cell. b. Relative orientation of enzyme-substrate complex. If the functional groups totally match and the distance between the faces (/?/) are 
at the optimal value of the van der Waals interaction (/?opt, marked by dashed line) the conversion has the highest activity {top right). If the 
enzyme is larger the conversion yield reduces {top left), if the enzyme is too small the substrate is sterically excluded {bottom left). If the functional 
groups differ in two functional groups the enzyme catalyses waste production {bottom right), c. independently replicating genes and random 
assortment to daughter cells (specialist enzymes in green and red); d. generalist enzymes; and e. protocell with a chromosome and accurate 
segregation mechanism attached to cell boundary (blue). Fitter protocells are marked by thicker boundaries. 
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daughter cells is still random (version 2), and the final 
stage of a fully formed chromosome with exact mechan- 
ism of distributing one copy to each daughter cell (ver- 
sion 3). 

We get markedly different results for the three ver- 
sions of our model, which only differ in how genes are 
assorted into daughter cells. Specialized enzymes do not 
evolve in protocells with individually replicating genes 
and random assortment (no chromosome case) even if 
there are only three reactions to catalyze (Figure 2a). 
Ribozymes remain generalist and a considerable number 
of side reactions can be observed. In contrast, protocells 
with chromosomes evolve toward fully specialist enzymes 
that efficiently channel flux toward biomass accumulation 
(Figure 2c). The inability of individually segregating replica- 
tors to become fully specialized rests on the high probabil- 
ity of losing a gene at protocell division (the assortment 
load). Protocells with less specific enzymes reduce the 
assortment load by maintaining metabolism at moderate 
efficiency. There is an intermediate evolutionary stage con- 
necting these two cases, as demonstrated by the results of 



the 2nd version of our model. Here the system first evolves 
to the specificity exhibited in the no chromosome case 
(Figure 2b). Only then could chromosomes form, because a 
newly formed chromosome can decrease the assortment 
load and give rise to a fially specialized system (compare 
Figure 2b and c). It is worth mentioning that in the 
intermediate system there is stfll some assortment 
load, as the chromosomes still assort randomly. Thus 
a beneficial mutation appearing in one chromosome, 
and helping the cell to divide faster might not get 
into one of the daughter cells. This, however, does 
not affect the end result, which is fuU speciaUzation. 

The invention of the chromosome allowed many specia- 
lized enzymes to evolve from initially generalist enzymes. 
We demonstrate that specific enzymes also evolve for 
longer reaction chains (Figure 3). Here a reaction chain 
consisting of five reactions is considered. Evolution quickly 
eliminates most of the catalysis toward unproductive side 
reactions, but roughly nine waste directions remain in the 
system for a long time (Figure 3a). From here, enzyme 
evolution slows down considerably, although in the end all 
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Figure 2 Number of waste catalytic direction in a 3-step reaction networic as a function of time in protocells with (a) individually 
replicating genes and random assortment; (b) potential for chromosome formation and random assortment; and (c) in protocells with 
chromosomes. Each line (black, red and green) shows the number of waste catalytic directions of the enzymes. For combinatorial and 
geometrical reasons the maximum number of waste catalytic direction in 3 dimensions is 9. (d) Change of frequency of cells with chromosome 
through time. The dynamic of the population is followed for 10^ time steps, and the first 5-10^ or 5-10^ is shown with data recorded at every 10^ 
time steps. The sizes of the three substrates are given by cyclic permutation of the sizes 1.0, 2.5 and 4.0 (1.0x2.5x4.0, 2.5x4.0x1.0 and 
4.0x1.0x2.5). Initially all enzymes has size of 7.0x7.0x7.0, which is large enough to fit onto every substrate. 
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Figure 3 Evolution of (a) the number of waste catalytic directions (color lines correspond to the population average of the five 
enzymes of the system with chromosomes; black line shows the average number of waste catalytic directions in case of independent 
replicators) and (b) the flux (both direct and waste) in the 5-step reaction network. Both fluxes are normalized (the unity of the direct flux 
corresponds to the optimal enzyme configuration). The direct flux fails to reach its maximal value due to recurrent mutations, (c) The stars 
represent the catalytic activities of the five enzymes relative to the maximal value represented by the circle at every 5-10^ time step, beginning at 
2-10^. The sizes of the five substrates are 1.0x2.5x4.0x5.5x7.0, and the cyclic permutation of these sizes. Initially all enzymes is a hypercube with a 
side of 9.0. These cubes are large enough to fit onto every substrate. 



enzymes find the optimal shape (Figure 3a). In the begin- 
ning the flux toward unproductive side reactions is nearly 
4 orders of magnitude larger than toward biomass, thus 
there is a strong selection for preventing unproductive side 
reactions. As a result, the flux toward biomass increases by 
3 orders of magnitude, although flux toward the waste 
decreases only slightly (Figure 3b). The enzymes evolve to a 
shape that excludes most of the side reactions (hence the 
increase in flux toward biomass), but at the same time 
increases activity toward a few of them (hence the only 
slight decrease in flux toward waste). A marked drop in the 
flux toward waste can only be observed when the last of 
the waste directions is eliminated. In the end, flux toward 
biomass accumulation increases by four magnitudes, while 
flux toward the waste directions decerases by three magni- 
tudes resulting in a ca. 10^ enhancement in specificity. 

The longer the reaction chain the longer it takes 
evolution to find the optimal solution (compare the 
time scales in Figure 2c and Figure 3a), but the solu- 
tion is always found. For this reason the above obser- 
vation can be extended to arbitrary reaction chain length 
(we have also obtained results for chain length of 6, data 
not shown) and different topologies, as there is nothing to 
suggest that the same mechanism could not work for 
longer reaction sets and more complex networks. How- 
ever, longer reaction chains are computationally more 
demanding, and it quickly becomes unfeasible to follow as 
the number of reaction steps increases. 



Our results are robust to the details of the model: 
changing the mutational variance or the redundancy of 
ribozymes within the protocells, or the introduction of 
fluctuating inflow of starting material, do not change 
our results in a qualitative manner. 

Assortment load can be lowered by increasing the in- 
ternal copy number of replicators (^*^^^^), although it also 
entails fitness costs (in terms of energy and speed). 
There is an increase of the attainable equilibrium flux of 
the last catalytic step by more efficient reduction of cata- 
lytic enhancement toward waste directions with the in- 
crease of n"^^^^; the increase is very modest (Figure 4). 
While assortment load is known to vanish at n^^^^ oo 
[16], however, this ideal state is approached very slowly, 
and for realistic numbers of internal molecules the 
attainable final flux remains well below the optimal flux 
observed in protocells with a chromosome. We can 
thus conclude that within a reasonable range of n^^^^ 
values, our qualitative result holds: full specialization 
could not be reached even if each replicator was in a 
great excess (Figure 4). 

Our choice of mutational variance, a = 0.05, allows con- 
vergence to equilibrium in reasonable time and at the same 
time it keeps fluctuation due to stochasticity down. Smaller 
mutational variance increases the time it takes to reach 
equilibrium and the final flux tends to its optimal value 
(unity), while a higher mutational variance decreases the 
mean flux toward biomass accumulation in a fully 
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Figure 4 The normalized flux at equilibrium as the function of 
the number of enzymes at protocell division (n*^"^). Other 
parameters as in Figure 2. 



specialized population (Figure 5). Thus, our choice of muta- 
tional variance does not affect our qualitative outcomes. 

We have checked the case when catalytic activity was 
proportional to the natural logarithm of the binding en- 
ergy: e = - In £, as dictated by thermodynamics. Results 
for reaction chain length of 3 show qualitatively the 
same results as results without the logarithm (data not 
shown). 

We conclude that our results are robust, and the same 
qualitative outcome can be observed with modified ver- 
sions of the model and/or in a vast area of the parameter 
space. Accordingly, it is important to understand why 
Kacser and Beeby have not achieved the stage of nearly 
complete enzyme specificity [17,18], despite assuming 
that genes sit on chromosomes. There are three crucial 
differences: we count with more than 3 dimensions for 
enzyme-substrate fit increasing the potential of full spe- 
ciality; we consider functional group identity; and as a 



consequence we allow for harmful side reactions. They 
calculated with active centre boxes only: it is easy to see 
that in three dimensions one cannot evolve fully specific 
enzymes for a linear pathway of 8 reactions. It is thus 
not surprising that they found mere partitioning of cata- 
lytic task space {sensu Kauffman [19]) without attaining 
high specificity. Furthermore, this partitioning allowed 
for historically contingent end states, which they indeed 
found to happen. 

Our results suggest that chromosome formation pre- 
ceded complex metabolism run by specific enzymes, but 
they do not suggest that no specific enzyme could form. 
We have set each of our reactions equally important, 
but none needed to be specific in order to function 
(albeit higher specificity bestowed a fitness advantage). 
The system with independently replicating genes evolved 
to a stage in which the opposing selective forces favoring 
fewer genes because of the assortment load, and higher 
efficiency due to specificity cancelled each other out. How- 
ever, we know that specific enzymes (i.e. two enzymes that 
are both required for a functional cell) can coexist despite 
the assortment load [20]. Certain cellular functions might 
require highly efficient and/or specific enzymes. The two 
are not necessary the same. For example, a replicase needs 
to be efficient (see below), but should at the same time be 
a generalist in the sense that it should be able to replicate 
any sequence. We hypothesize that a few specific and a 
larger number of generalist enzymes could have coexisted 
before the evolution of the chromosome. 

Linkage of genes and complexity and specificity of 
metabolism coevolved. Maynard Smith and Szathmary 
have demonstrated that the chromosome can evolve 
by genes linking together and outcompeting the cells 
with independently replicating genes [21], which we 
have also shown. In our model, linkage went to fix- 
ation only after specificity reached the level attainable 
in a system with independently replicating genes, even 
though chromosome-harboring cells appeared earlier. 
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Figure 5 The evolution of mean flux (both direct and waste) in a 3-step reaction network. Botli fluxes were normalized (the unity 
corresponds to the optimal enzyme configuration). Mean flux is determined every 10^ time steps. The model was run for 5-10^ time steps, results 
for the first 2-10^ time steps are shown. All replicate populations reached the fully specialized state in equilibrium. Other parameters are as in 
Figure 2. 
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but these were competed out. Thus we show that gen- 
etic representation and metaboUsm coevolve. Our sim- 
ple model cannot capture all the necessary ingredients 
of the evolution of the chromosome, for example the 
extra enzymatic functions required [22]. Two novel 
functions need to evolve: an RNA endonuclease en- 
zyme that liberates the ribozymes from the chromo- 
some, and some way to attach the chromosome to 
the cell boundary, so that the growth of the boundary 
can help separate the two copies. The first enzymatic 
function is straightforward: all extant ribozymes cleave 
RNA [23] and the simple structural motifs exhibited 
by the hairpin or the hammerhead ribozymes are 
common even in random pools of short RNAs [24]. 
Moreover, an enzyme that can cleave RNA is often 
also proficient in ligating them, a function which is 
essential for the formation of the chromosome, al- 
though chromosomes could have emerged by recom- 
bination as well. For the second function, chromosome 
separation, something that attached the chromosome to 
the cell wall is required (assuming that the cell has a cell 
wall, like most prokaryotes do) [22]. This linkage could be a 
small peptide. 

Conclusion 

Our results demonstrate that a highly specific enzyme 
set is unlikely to evolve before the invention of chro- 
mosomes. The appearance of chromosomes is made 
possible by considerable increase in the fidelity of 
replication, as the amount of the genetic information, 
and thus the number of different enzymes a protocell 
could have had, is limited by the fidelity of the copy- 
ing process [25]. For example, the 99.4% copying fi- 
delity exhibited by the putative replicase ribozyme 
[10] would allow for a genome having roughly 1,200 
nucleotides [26], still nearly a magnitude less than 
estimated for a minimal ribo-organism [26]. In order 
to overcome this error threshold the genetic informa- 
tion needs to be maintained as individual replicators 
[20,27]. However, when replicators replicate individu- 
ally then there is intragenomic conflict [25], as the 
fastest replicator tends to dominate the system, 
thereby causing the loss of other replicators, and thus 
information. This internal conflict can be suppressed in 
a small, randomly assorted population of compartmen- 
talized replicators, where the stochastic nature of segre- 
gation to daughter protocells upon division can, 
through the generation of a more equally distributed 
gene set, ensure the maintenance of the full diversity of 
the original set of enzymes [20]. However, random as- 
sortment sets another error threshold: the number of 
different replicators that can be maintained is limited 
by the total number of replicators present. The fidelity 
of the replication process as well as the control 



mechanisms that guide the segregation of the chromo- 
some evolved at this stage of the origin of life. Diversi- 
fied, complex metabolism evolved afterwards. 

How diversified and complex the minimal metabolism 
was is still debated [28,29], but a figure around 200 
genes emerges as the minimum for a DNA-peptide or- 
ganism. This figure, however, contains all the genes for 
translation and also for DNA replication, functions that 
did not exist when the chromosome evolved [1]. The 
minimal gene set suggested by comparison of bacterial 
genomes [30] includes 95-96 genes for translation, 
nearly half of the suggested minimal set of 206 genes 
[30]. Furthermore, there are 15 genes involved in other 
protein related functions and 16 genes for DNA replication 
and other DNA related functions (repair, modification, re- 
striction). Thus a ribo-organism could function with less 
than a 100 genes. The minimal intermediate metabolism is 
suggested to require 50 enzymes [31]. An RNA-dependent 
RNA polymerase is required, and if it does not also posses 
helicase activity, then a separate enzyme for that function is 
also required (2 genes). We should also include 2 genes for 
RNA degradation, 1 for cell division, and 4 involved in 
transport [30]. This gives us an estimate of around 60 
genes. Considering that, strictly speaking, this is 60 func- 
tions and not 60 genes, the final figure can even be less as 
generalist enzymes can catalyse more than one of the pro- 
posed reactions. This set of enzymes is supposed to be 
present already at the stage of independent replicators. 

A further ingredient of the evolution of increasing en- 
zyme specificity could have been the advantage gained 
from metabolic regulation. In an unregulated metabolism, 
cross-catalysis might be neutral or even beneficial (forget- 
ting side reactions for a moment), but if the cell wants to 
down-regulate enzyme A that converts substrate a, 
because the pathway is temporarily not needed, it can 
easily mean that the conversion rate of some other 
substrates, say p and will also diminish. Regulation 
in general makes sense only with specific targets, A fu- 
ture goal is simulation of the coevolution of protocell 
metabolic network and enzymes, using artificial chem- 
istry [32], which in all likelihood will generate further 
insight into protocell evolution in general, including 
membrane-metabolism coevolution [33] that may have 
led from completely heterotrophic protocells [34] to 
cells with a rich internal metabolic network. 

Appendix 

Calculation of the flux 

Enzyme- substrate interaction. We assume the enzyme 
active sites as cubic cavities of D dimensions; substrates 
are D dimensional cubes and the activated complex as a 
cube centered in the cavity (with parallel faces). The 
interaction acts between the 2D pieces of D-l dimen- 
sional faces of the cavity and the cube. Thus the steric 
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interaction energy of the enzyme- substrate complex can 
be calculated by the van der Waals formula: 




where Rj is the distance between the yth inner wall of 
the enzyme and the corresponding face of the substrate 
(see Figure lb). In our simulations, following Kacser 
and Beeby [17] we used A=l and B=10, 

The catalytic activity of an enzyme depends only on the 
van der Waals interaction between enzyme and substrate 
and can be approximated as the logarithm of this energy. 

e = ln(— e), 

where we fixed the proportionality to one. In our simula- 
tions to speed up the convergence we used simple propor- 
tionality between interaction energy and catalytic activity: 



We believe that this simplification is not qualitatively 
significant for our results. In case of non perfect func- 
tional group matching (i.e. waste catalytic directions) we 
assume a 0.5 penalization factor. Note that the optimal 
catalytic activity (which is independent of the size of the 
substrate) is: 



-D. 



Flux of a branched chain of reactions with unsaturated 
enzymes. Let us assume an L-step chain of reaction with 
waste products, see Figure 1 for L=5. Each reaction step 
(both forward and waste) is treated as a simplified 
Michaelis-Menten step with unsaturated enzymes. In 
this case the flux of an elementary reaction step is 

where [S] and [P] is the substrate and product concen- 
tration, respectively. To get the final flux we should 
introduce some more simplifications. We assume a 
large, well-mixed reactor and the waste concentration 
is treated as zero: [Wi] ~ 0. If one holds the initial 
and final substrate concentrations constant ([Xq] = 1, 
[Xfinai] = 0) we obtain the following linear system for 
the simplified kinetics 



^ ei{[Xi- 



{i=l,2,. 



= 1,2,. ..,L) 
■,L) 



for the waste and intermediate fluxes, and 



(/= 1,2... L-1) 



for the flux conservation. The final flux /fmai (i-e. the 
fitness) can be easily computed from this set of equations. 



For L-3 the set of equations defines the direct and 
waste fluxes: 



/i =ei(l- [Xi]) 

h = eil 
h = e2{[Xi 



I, = e,[X,] 
h = e^{[X2] 
I, = e,[X2] 

h=h+h 



0) 



and the final flux I3 is the following: 



{ei + ^2 + e2)(e3 + 63) + 62(^1 + 62) 

An important special case is when all direct catalytic ac- 
tivities have the same value (e.g. their common maximal 
value e% see previous section) and the waste activities 
are all zero. In this case the total flux (the theoretical 
maximum of the flux) is the following 



L2A^ 



-D. 



If the number of reaction steps is equal to the dimen- 
sion of enzymes the optimal flux depends on van der 
Waals parameters only: 



B^ 
2A' 
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Reviewers' report 

Reviewer 1: Sandor Pongor, International Centre for Genetic Engineering and 
Biotechnology, Trieste, Italy 

Szathmary and coworkers seek to answer the question re when complex 
metabolism could have originated in the course of evolution. The question is 
highly relevant, and to the best of my knowledge it has not been tackled by 
other studies. Timing in relation to the major evolutionary transitions is an 
original and elegant idea that is especially suited for modeling studies. The 
authors propose that specialist enzymes emerged after the appearance of 
the chromosome because protocells harbouring unlinked genes maintain 
largely aspecific enzymes to reduce their assortment load. 
The authors attack the problem using an elegant model of a population of 
protocells. The presentation of the model is clear and straightforward, and 
the authors show that the model is robust in the sense that some changes 
in the methodology do not affect the qualitative outcome of the 
simulations. This is where I would like to raise my first question. Metabolism 
implemented in the paper is based on a linear set of reactions. While there 
are linear anabolic pathways (e.g. fatty acid synthesis), many of the 
supposedly ancient pathways have more complex topology. Do the results 
of the model change if different topologies, in particular the autocatalytic 
cycle, are also considered? 
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The underlying equations of fluxes will not qualitatively change if we change 
the topology of the reaction network. Thus our result will be qualitatively the 
same for any topology 

As my background is in biochemistry, I cannot resist asking questions regarding 
the nature of enzyme-substrate interactions. The authors admit that their 
representation of this interaction is a rather abstract one. I agree that such a 
representation is adequate for the question at hand. Nevertheless, it should be 
discussed in some depth, the consequences of the abstraction, in comparison 
to "real" enzyme-substrate interactions. While Kacser and Beeby employed 3D 
blocks in their cited study, Szilagyi et al. here assume hypercubes of n>3 
dimensions. What is the precise meaning of these dimensions? Furthermore, 
more realistic descriptions, used for instance in classical molecular dynamics, 
apply a variety of explicitly described molecular interaction types. Why did the 
authors choose the Lennard-Jones potential, and would it make a difference if 
other interactions would also be considered? 

The active site of an enzyme is a complex cavity, where the relative positions of 
a number of atoms are key to successful catalysis. Such positioning can only be 
described by more than 3 values. In reality the abstract dimensions we 
employed would translate to distances and angles between side chains/atoms 
participating in catalysis. In a similar vein any potential function that has one 
minimum would lead to the same qualitative results as our model, because 
only the existence of a perfect fit matters here. Thus we could make the model 
more complex, although it would not alter the qualitative results, but such 
complexity might blur our clear message by too much technicality. 
The results are nicely presented and the underlying mechanism adequately 
discussed. The mention of regulation in the outlook is very important, there 
is often much talk about enzyme catalysis, but less about the regulation and 
coordination required for a truly complex metabolism. That notwithstanding, 
I missed a discussion of minimal metabolism in the paper. Namely, the 
metabolic complexity required at different stages of the evolution of life sets 
a minimum requirement on the number of reactions needed. One would 
expect that the invention of chromosome would also lead to new enzyme 
functions. I would like that the authors discuss this matter in the paper. 
We now discuss the minimal number of enzymes required for a minimal 
protocell with and without chromosome. 

In summary, I consider this paper will be a welcome addition to the field, 
and warmly recommend for publication in Biology Direct. 
We are grateful for your useful comments, that helped us to improve the manuscript 
Reviewer 2: Rob Knight, University of Colorado Boulder, USA 
In this paper, the authors address the question of the relative order in which 
enzymes with high specificity evolved relative to the evolution of 
chromosomes, fitting these two events into their "major transitions in 
evolution" framework. They accomplish this by modeling enzyme evolution 
according to a block-and-cavity model previously and successfully used for 
other studies, implemented in two versions: one with ribozymes unlinked, 
and one with ribozymes linked into chromosomes. Essentially, the model 
proto-cells either divide once a threshold concentration of the chromosome 
is reached, or once enough independently replicating RNA enzymes reach 
sufficiently high concentration (but the daughter cells might not have all the 
ribozymes). The ribozymes are initially fully general but specialize during the 
simulation. In the case without chromosomes, the ribozymes remain 
unspecialized, whereas in the case with chromosomes the ribozymes rapidly 
specialize to carry out specific reactions. The interpretation is that 
chromosomes allow specialization because each ribozyme can then 
guarantee co-occurrence with other, specialized ribozymes. 
This work is interesting in that such a clear result, that linkage of functions 
drives specialization, arises from such a simple, abstract model. I do have 
some concerns about the generality of the conclusions reached. For 
example, some other assortment mechanism than chromosomes that would 
also result in physical partition, for example hybridization of complementary 
regions or ability to bind a common substrate (e.g. through accessory 
aptamer domains, or through "zip code"-style packaging signals and 
apparatus) would be formally equivalent in the model, yet would imply a 
very different pathway of evolution with respect to chromosomes 
specifically. Additionally, it might be interesting to explore the implications 
of linkage for parasitism of the system by non-functional RNAs. 
Thank you very much for this comment As you mentioned our conclusion will 
not change if other modes of linkage are considered. Once linkage allows the 
evolution of a more complex metabolism other linkage mechanism could also 
be explored. Thus, any particular molecular mechanism of linkage suffices, and 
can give rise to the ligation-based linkage assumed in the chromosome. 



The work of Briones et al. doi: 10.1 261 /ma. 1488609 on the evolution of 
modular RNAs versus single large RNAs is also relevant and should perhaps 
be discussed. 

Briones et al. [24] elegantly demonstrate that the structural diversity of RNA 
molecules can be extended by the ligation of randomly formed strands. It opens 
up the possibility of gradual increase in complexity. That study deals with a 
prior stage in the origin of life, the one leading to a replicase which - we claim 
- is a prerequisite for the (proto-) cellular stage. 

The equations were missing symbols (notably sigma signs) in the version I 
reviewed, and this needs to be corrected before publication, along with the 
language errors noted below. 
We have corrected these errors. 

Overall, I believe this is a valuable contribution to the literature that, with 

appropriate cautionary notes about the limits of what the model can define, 

will be of interest to those studying the origins of modern life. 

Reviewer 3; Caspar Jekely, Max Plank Institute for Marine Biology, Tubingen, 

Cermany 

In this paper Szilagyi and colleagues convincingly demonstrate that the 
origin of chromosomes must have preceded the origin of efficient specialist 
enzymes. 

Such an important conclusion can only be reached by the rigorous 
numerical simulations (and not by speculation alone) that characterize the 
work of Szathmary's group. 

The paper is clearly written, and it is shown that the conclusions are robust 
to changes in the parameters. I have a few comments that I hope the 
authors can address in a revised version. 

First, Szilagyi and colleagues consider only the two extremes of linkage (all 
or none), which leaves open the question if more specialized enzymes could 
have been maintained by limited linkage. One can imagine that initially it 
was only pairs or small numbers of RNA genes that were linked. Would a cell 
with 2-gene chromosomes be able to outcompete a cell with no linkage 
and a cell with 3-gene chromosomes a cell with 2-gene chromosomes (and 
so on)? Demonstrating such graduality in the origin of chromosomes could 
provide a further valuable aspect to the model. 

We have included another version of the model that represents a transitional 
state connecting the fully independently replicating genes and the fully formed 
chromosome with controlled segregation (see revised Method section). We 
demonstrate that linkage can go to fixation and linkage of genes in a 
chromosome is enough for full specialization, even without controlled 
segregation. Incidentally, this echoes the 1993 model of Maynard Smith and 
Szathmdry that did not model enzyme evolution, however We are very grateful 
for the comment and we hope we were able to demonstrate that the transition 
from one system to the next is also possible. 

Second, the authors may consider discussing the issue of how the origin of 
efficient replicators relates to the origin of linkage. Since replication of 
chromosomes also requires efficient specialist enzymes (e.g. a primase, a 
replicase and a helicase), their origin must have also been influenced by 
assortment load. If the efficient replication of longer chromosomes requires 
multiple specialist enzymes, that can only evolve once chromosomes have 
appeared, this presents another error threshold-type problem. 
We agree that an error-threshold-like problem unfolds with independently 
replicating genes, apart from the one stemming from the mutational load: 
random assortment causes loss of genes, which can be tolerated to certain 
extent [20], but limits the number of genes that can coexist However, 
replicating a chromosome or individual genes requires the same set of enzymes. 
In essence, in the unlinked system there are as many chromosomes as there are 
genes (and each chromosome can be in multiple copies). Thus if we assume 
that the system with individually replicating genes can exist (which is an 
interesting question in its own right!), then the one with linkage does not need 
significantly more enzymes (see our discussion). 
In the discussion the authors write that 'The fidelity of the replication 
process as well as the control mechanisms that guide the segregation of the 
chromosome evolved at this stage of the origin of life. Diversified, complex 
metabolism evolved afterwards". Civen that the fidelity of replication and the 
control mechanisms that guide chromosome segregation presumable also 
depended on specialist enzymes, I am wondering if all these properties may 
have rather coevolved with linkage. 

We agree that the complexity of metabolism coevolved with the genetic 
representation. Our results do not imply that no specific enzyme could evolve, 
only that given the possibility of generalist enzymes, evolution will not opt for 
them. Furthermore, a good replicase is a generalist enzyme, as it should take 



Szilagyi et al. Biology Direct 2012, 7:38 
http://www.biology-direct.conn/content/7/1/38 



Page 10 of 10 



0.1 



X 

3 



0.01 



1E-3 



1x10^ 



2x10^ 3x10^ 4x10^ 5x10^ 
Time 

Figure 6 The evolution of mean direct flux in the 3-step 
reaction network with individually replicating genes. Other 
parameters are as in Figure 2. 



many kinds of substrate (different sequences) and copy them. At the same time 
it should also be an efficient enzyme, as it should work with high fidelity in 
replication. Higher efficiency measured as flux, evolves in the simpler system as 
well (Figure 6). 

The above considerations boil down to the question: rather than taking 
linkage as given, could selection for more efficient enzymes have driven the 
gradual origin of linkage? 

We now demonstrate that, given the possibility of linkage, the higher efficiency 
attainable drives the system toward the fixation of chromosomes and in turn to 
full specialization. 
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